Abstract: In this paper, an area- and power-efficient modulo $2^n+1$ multiplier is proposed. The result and one operand of
the new modulo multiplier use weighted representation, while the other operand uses diminished-1 representation. By using
radix-4 Booth recoding, the new multiplier reduces the number of partial products to n/2 for even n and (n+1)/2 for odd n,
except for one correction term. According to our algorithm, the resulting partial products are reduced by an inverted
end-around-carry carry-save adder to two operands, which are finally added by a 2-stage n-bit adder containing a 2:1
multiplexer. By using the proposed adder, the new multiplier reduces area and power. Analytical and experimental results
indicate that the new modulo $2^n+1$ multipliers offer lower power and a more compact area than existing structures.
I. Introduction

Residue number systems (RNS) reduce carry-propagation delay and are therefore suitable for implementing high-speed
digital signal processing devices. Some arithmetic operations, such as addition and multiplication, can be carried out more
efficiently in RNS than in conventional two's complement systems. RNS has been adopted in the design of Digital Signal
Processors (DSP), Finite Impulse Response (FIR) filters, image processing units, Discrete Cosine Transform (DCT)
processors, communication components, cryptography, and other DSP applications. In recent years, efficient schemes for
modulo multipliers have been studied intensively. Generally, modulo $2^n+1$ multipliers can be divided into three
categories, depending on the type of operands that they accept and output:

i. the result and both inputs use weighted representation;

ii. the result and both inputs use diminished-1 representation;

iii. the result and one input use weighted representation, while the other input uses diminished-1 representation.

For the first category, Zimmermann et al. [1] used Booth encoding to realize the multiplier but departed from
diminished-1 arithmetic, which leads to a complex architecture with large area and delay requirements. For the second
category, Wang et al. [2] proposed diminished-1 multipliers with n-bit input operands. These multipliers use a non-Booth
recoding and a zero partial-product counting circuit. The main drawback of this architecture is that the handling of zero
inputs and zero results was not considered.

Curiger et al. [3] proposed modulo multipliers of the third category. Architectures based on ROM look-up methods are
competitive, but as n increases they become infeasible due to excessive memory requirements.

Jian et al. [4] also proposed an architecture of the third category, which reduces the memory requirement and improves
speed. That architecture is based on n-bit addition and the radix-4 Booth algorithm, which is efficient and regular. In this
work, we replace its diminished-1 modulo $2^n+1$ adder with a 2-stage n-bit adder.

The remainder of the paper is organized as follows. The mathematical formulation of the diminished-1 number
representation used in the modulo multiplier is presented in Section II. The proposed structures are presented in Section
III. The hardware and time complexity of the proposed structures are discussed and compared with existing structures in
Section IV. The conclusion is presented in Section V.

II. Diminished-1 Number Representation

The modulo $2^n+1$ arithmetic operations require (n+1)-bit operands. To avoid (n+1)-bit circuits, the diminished-1
number system [15] has been adopted. Let d[A] be the diminished-1 representation of the normal binary number
$A \in [0, 2^n]$, namely

$$d[A] = |A - 1|_{2^n+1} \qquad \text{(i)}$$

In (i), when $A \neq 0$, $d[A] \in [0, 2^n - 1]$ is an n-bit number, so (n+1)-bit circuits can be avoided in this case.
However, for $A = 0$,

$$d[A] = d[0] = |-1|_{2^n+1} = 2^n \qquad \text{(ii)}$$

is an (n+1)-bit number. This leads to special treatment for d[0]. The diminished-1 arithmetic operations [15] are defined as



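$$d[-A] = \overline{d[A]}, \quad \text{if } d[A] \in [0, 2^n - 1] \qquad \text{(iii)}$$

$$d[A + B] = |d[A] + d[B] + 1|_{2^n+1} \qquad \text{(iv)}$$

$$d[A - B] = |d[A] + \overline{d[B]} + 1|_{2^n+1} \qquad \text{(v)}$$

$$d[AB] = |d[A] \times d[B] + d[A] + d[B]|_{2^n+1} = |d[A] \times B + B - 1|_{2^n+1} \qquad \text{(vi)}$$

$$d[2^k A] = \mathrm{iCLS}(d[A], k) \qquad \text{(vii)}$$

$$d[-2^k A] = \mathrm{iCLS}(\overline{d[A]}, k) \qquad \text{(viii)}$$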

where $\overline{d[A]}$ represents the one's complement of d[A]. In (vii) and (viii), iCLS(d[A], k) is the k-bit
left-circular shift of d[A] in which the bits circulated into the LSB are complemented.
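
The identities (iii) to (viii) can be checked numerically for small n. The following is a minimal sketch, not part of the proposed hardware; the helper names `d`, `ones_complement`, `icls` and `check` are hypothetical and chosen only for this illustration. Operands are restricted to nonzero values so that d[A] fits in n bits, as required by (iii).

```python
def d(a: int, n: int) -> int:
    """Diminished-1 code of a in [0, 2^n], equation (i): |a - 1| mod (2^n + 1)."""
    return (a - 1) % ((1 << n) + 1)


def ones_complement(x: int, n: int) -> int:
    """Bitwise complement of an n-bit word."""
    return ~x & ((1 << n) - 1)


def icls(x: int, k: int, n: int) -> int:
    """k-bit left-circular shift of an n-bit word.

    Each bit that re-enters at the LSB is complemented.
    """
    mask = (1 << n) - 1
    for _ in range(k):
        msb = (x >> (n - 1)) & 1
        x = ((x << 1) & mask) | (msb ^ 1)
    return x


def check(n: int) -> bool:
    """Exhaustively compare (iii)-(viii) with ordinary arithmetic modulo 2^n + 1."""
    m = (1 << n) + 1
    for a in range(1, m):  # nonzero operands, so d[a] fits in n bits
        da = d(a, n)
        # (iii): negation is the one's complement of d[a]
        assert d((-a) % m, n) == ones_complement(da, n)
        for k in range(n + 1):
            # (vii) and (viii): multiplication by +2^k and -2^k
            assert d((a << k) % m, n) == icls(da, k, n)
            assert d((-(a << k)) % m, n) == icls(ones_complement(da, n), k, n)
        for b in range(1, m):
            db = d(b, n)
            assert d((a + b) % m, n) == (da + db + 1) % m                      # (iv)
            assert d((a - b) % m, n) == (da + ones_complement(db, n) + 1) % m  # (v)
            assert d((a * b) % m, n) == (da * db + da + db) % m                # (vi)
            assert d((a * b) % m, n) == (da * b + b - 1) % m                   # (vi), second form
    return True
```

Calling `check(4)` runs through every nonzero operand pair modulo 17 and returns `True` if all identities hold.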

III. Proposed Architecture

In the new modulo $2^n+1$ multiplication, the result and one input use weighted representation, while the other input
uses diminished-1 representation. Let $d[A] = (a_n a_{n-1} \ldots a_1 a_0)_2$ be the diminished-1 representation of the
weighted number A, and let $B = (b_n b_{n-1} \ldots b_1 b_0)_2$ and
$P = |A \times B|_{2^n+1} = (p_{n-1} p_{n-2} \ldots p_0)_2$ both be weighted. According to radix-4 Booth recoding [15], the
product can be written as

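$$P = |A \times B|_{2^n+1} = \Big| \sum_{i=0}^{K-1} \cdots \Big|_{2^n+1} \qquad \text{(ix)}$$

where the summand and its accompanying definitions are not recoverable from the source, and

$$K = \begin{cases} n/2, & n \text{ even} \\ (n+1)/2, & n \text{ odd} \end{cases}$$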



From (ix) it is clear that the architecture consists of the partial product generator (PPG), the correction term
generator (CTG), the inverted end-around-carry carry-save adder (EAC CSA) and the 2-stage n-bit adder. Based on this
architecture, a more effective solution is proposed.
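
To illustrate why an inverted end-around carry suits modulo $2^n+1$ reduction, the sketch below models one carry-save stage on three n-bit words. It relies only on the fact that $2^n \equiv -1 \pmod{2^n+1}$, so a carry-out of weight $2^n$ can re-enter at the LSB in complemented form at the cost of a constant offset of -1 per stage. This is an illustrative behavioral model under that assumption, not the block taken from [4]; the function names are hypothetical.

```python
def inverted_eac_csa(x: int, y: int, z: int, n: int) -> tuple[int, int]:
    """One carry-save stage on three n-bit words.

    Returns (s, c), where c is the carry vector shifted left by one bit with
    the complemented carry-out re-entering at the LSB.
    """
    mask = (1 << n) - 1
    s = x ^ y ^ z                          # bitwise full-adder sum outputs
    carry = (x & y) | (x & z) | (y & z)    # bitwise full-adder carry outputs
    carry_out = (carry >> (n - 1)) & 1     # the carry bit of weight 2^n
    c = ((carry << 1) & mask) | (carry_out ^ 1)
    return s, c


def stage_is_consistent(n: int) -> bool:
    """Check x + y + z == s + c - 1 (mod 2^n + 1) for all n-bit inputs."""
    m = (1 << n) + 1
    for x in range(1 << n):
        for y in range(1 << n):
            for z in range(1 << n):
                s, c = inverted_eac_csa(x, y, z, n)
                if (x + y + z) % m != (s + c - 1) % m:
                    return False
    return True
```

The exhaustive check in `stage_is_consistent` confirms that each stage preserves the sum modulo $2^n+1$ up to the constant -1.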

With an encoding scheme that follows the radix-4 Booth recoding of [4], the partial product generator (PPG) can be
constructed from the well-known Booth encoder (BE) and Booth selector (BS). The blocks used in the PPG and the EAC
CSA are taken from [4].

In this paper, we modify the BE block, which takes successive overlapping triplets $(b_{2i+1}, b_{2i}, b_{2i-1})$ and
encodes each as an element of the set {-2, -1, 0, 1, 2}. Each BE block produces 3 bits: 1x, 2x and Sign. These 3 bits,
along with the multiplicand, are used to form the partial products.
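
As an illustration of the triplet encoding, the following sketch recodes an unsigned operand into radix-4 Booth digits and derives 1x, 2x and Sign bits for each digit. It is the textbook recoding with zero padding above the MSB, not the modified BE block of this paper (which uses K digits and also handles the zero-input case); the function name and the tuple layout are assumptions made for this example.

```python
def booth_radix4(b: int, width: int) -> list[tuple[int, int, int, int]]:
    """Recode an unsigned `width`-bit integer into radix-4 Booth digits.

    Each entry is (digit, one_x, two_x, sign) for one overlapping triplet
    (b[2i+1], b[2i], b[2i-1]), with b[-1] = 0 and zero padding above the MSB.
    """
    digits = []
    padded = b << 1                       # appends b[-1] = 0 below the LSB
    count = width // 2 + 1                # enough digits for a zero-extended operand
    for i in range(count):
        t = (padded >> (2 * i)) & 0b111   # bits b[2i+1], b[2i], b[2i-1]
        hi, mid, lo = (t >> 2) & 1, (t >> 1) & 1, t & 1
        digit = -2 * hi + mid + lo        # value in {-2, -1, 0, 1, 2}
        one_x = mid ^ lo                  # set when the magnitude is 1
        two_x = int(digit in (-2, 2))     # set when the magnitude is 2
        sign = hi                         # set for negative digits (and for triplet 111)
        digits.append((digit, one_x, two_x, sign))
    return digits


# The digits reproduce the operand: sum(digit_i * 4^i) == b.
recoded = booth_radix4(157, 8)
assert sum(digit * 4**i for i, (digit, _, _, _) in enumerate(recoded)) == 157
```

Because the operand is zero-extended, this generic version needs one digit more than the K digits used by the proposed multiplier.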
The CTG produces the correction term, which has the form $(\ldots 0\,x_{i+1}\,0\,x_i \ldots 0\,x_1\,0\,x_0)_2$ with
$x_i \in \{0, 1\}$. Since the 2i-th bit $x_i$ is 1 when the $BE_i$ block encodes 0 and is 0 otherwise, one XNOR gate
accepting the 1x and 2x bits of the block can generate the 2i-th bit $x_i$.
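
The bit pattern described above can be sketched as follows, assuming that the 1x and 2x bits of a block are set exactly when the Booth digit has magnitude 1 or 2 respectively. The function name `correction_bits` is hypothetical, and the sketch covers only the $x_i$ bits, not any further constants the correction term may contain.

```python
def correction_bits(booth_digits: list[int]) -> int:
    """Place x_i at bit position 2i, where x_i = 1 iff Booth digit i is 0."""
    word = 0
    for i, digit in enumerate(booth_digits):
        one_x = int(abs(digit) == 1)
        two_x = int(abs(digit) == 2)
        x_i = 1 ^ (one_x ^ two_x)     # XNOR of the 1x and 2x select bits
        word |= x_i << (2 * i)
    return word


# Digits (0, 1, -2, 0) give x = (1, 0, 0, 1), i.e. the word 0 1 0 0 0 0 0 1.
print(format(correction_bits([0, 1, -2, 0]), "08b"))  # 01000001
```

Since 1x and 2x are never both set, the XNOR output is 1 only when both are 0, which is the case in which the block encodes 0.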

The inverted EAC CSA tree reduces the partial products to two numbers. The CSA tree is usually constructed
with full adders (FA). The final two numbers from the tree are then passed through the 2-stage n-bit adder. The 2-stage
n-bit adder consists of two ripple-carry adders, one with $C_{in} = 0$ and one with $C_{in} = 1$, and one 2:1 multiplexer.
The $C_{out}$ of the first n-bit ripple-carry adder acts as the control signal of the multiplexer, and the two n-bit sums of
the ripple-carry adders are its data inputs. If $C_{out} = 0$, the final sum is the sum computed with $C_{in} = 1$;
otherwise it is the sum computed with $C_{in} = 0$, as shown in Fig. 3.
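
A behavioral sketch of the 2-stage n-bit adder described above is given below. It assumes that the first ripple-carry adder is the one with $C_{in} = 0$, so that the multiplexer effectively adds the complement of $C_{out}$ to the sum; the function names are hypothetical.

```python
def ripple_carry_add(x: int, y: int, c_in: int, n: int) -> tuple[int, int]:
    """Bit-serial model of an n-bit ripple-carry adder; returns (sum, c_out)."""
    total, carry = 0, c_in
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    return total, carry


def two_stage_add(x: int, y: int, n: int) -> int:
    """Select between the C_in = 0 and C_in = 1 sums using the first adder's C_out."""
    sum0, c_out = ripple_carry_add(x, y, 0, n)
    sum1, _ = ripple_carry_add(x, y, 1, n)
    return sum1 if c_out == 0 else sum0   # 2:1 multiplexer
```

For n-bit inputs, the selected sum equals $|x + y + 1|_{2^n+1}$ whenever that value fits in n bits, which matches the diminished-1 addition in (iv). The case $x + y + 1 = 2^n$ yields an all-zero n-bit word and would need separate treatment.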
Figure 2(a): Booth encoder
Figure 2(b): Truth table
Example: When n = 8, let A = (227)_10 and B = (157)_10; then X, Y, Sum-1, Sum-2 and Final Sum are 8-bit data.



Figure 3: 2-stage 8-bit adder with multiplexer



Example: When n = 8, let A = (227)_10 and B = (157)_10; then d[A] = (226)_10, and the quantity computed is
$|A \times B|_{2^n+1}$. In binary, d[A] = (11100010)_2, B = (10011101)_2, a_8 = 0 and b_8 = 0.
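
The operands of this example can be checked with a few lines of integer arithmetic. The sketch below evaluates the product directly and through the diminished-1 form of A, using $A \times B = (d[A] + 1) \times B$, which follows from (i) and (vi). The variable names are chosen for this illustration only.

```python
n = 8
modulus = (1 << n) + 1            # 2^8 + 1 = 257
A, B = 227, 157                   # operands from the example
d_A = (A - 1) % modulus           # diminished-1 form of A, equation (i)

product_direct = (A * B) % modulus
product_from_d = (d_A * B + B) % modulus   # (d[A] + 1) * B, rearranged from (vi)

print(d_A, format(d_A, "08b"), format(B, "08b"))   # 226 11100010 10011101
print(product_direct, product_from_d)              # 173 173
```

Both evaluations agree, giving the residue 173 modulo 257.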
IV. Results and Simulation

The proposed architecture has lower hardware complexity than that of [4], which contains a modulo $2^n+1$ adder.
In the proposed architecture we use the 2-stage inverted n-bit adder instead, and evaluate the design for 8, 12 and 16 bits.
To estimate timing, area and power for ASIC design, we used Synopsys Design Compiler to synthesize the design
to gate level.

Synopsys results for the proposed architecture and the diminished-1 modulo $2^n+1$ architecture of [4] are compared in
Table 1 (area) and Table 2 (power).

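These improvements are reasonable: compared with the diminished-1 modulo $2^n+1$ multipliers for weighted
representation of [4], the new multipliers replace the modulo $2^n+1$ adder with the area- and power-efficient 2-stage
inverted n-bit adder. From Table 1 and Table 2, the area is reduced by about 1.6% to 3.0% and the power by about 3.4% to
4.4% across the three word lengths.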



Table 1: Synopsys results for area

                     Area (µm²)
Multiplier           8 bit        12 bit       16 bit
Proposed             4755.2651    8984.3446    15124.7143
Jian et al. [4]      4901.5240    9127.5707    15370.098



Table 2: Synopsys results for power

                     Power at 50 Hz (µW)
Multiplier           8 bit        12 bit       16 bit
Proposed             13.6532      15.6768      29.0434
Jian et al. [4]      14.2816      16.2569      30.0773
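
The relative savings implied by Table 1 and Table 2 can be computed as shown below. The figures are copied from the tables; the dictionary layout and the function name `reduction_percent` are assumptions made for this illustration.

```python
area_um2 = {  # Table 1
    "Proposed": {8: 4755.2651, 12: 8984.3446, 16: 15124.7143},
    "Jian et al. [4]": {8: 4901.5240, 12: 9127.5707, 16: 15370.098},
}
power_uw = {  # Table 2
    "Proposed": {8: 13.6532, 12: 15.6768, 16: 29.0434},
    "Jian et al. [4]": {8: 14.2816, 12: 16.2569, 16: 30.0773},
}


def reduction_percent(table: dict) -> dict:
    """Relative saving of the proposed design versus [4], in percent, per word length."""
    ref, new = table["Jian et al. [4]"], table["Proposed"]
    return {bits: round(100.0 * (ref[bits] - new[bits]) / ref[bits], 1) for bits in ref}


print(reduction_percent(area_um2))   # {8: 3.0, 12: 1.6, 16: 1.6}
print(reduction_percent(power_uw))   # {8: 4.4, 12: 3.6, 16: 3.4}
```

The savings are largest for the 8-bit design in both area and power.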



V. Conclusion 

In this paper, we proposed an area- and power-efficient modulo $2^n+1$ multiplier. The architecture uses a 2-stage n-bit
adder and Booth recoding, which reduces the number of partial products to n/2 for even n and (n+1)/2 for odd n; this is the
smallest number of partial products among all published modulo multipliers. The reduction scheme uses the well-known
inverted EAC CSA tree, and the final 2-stage inverted n-bit adder generates the result. The circuit that handles the
zero-input case is merged into the first Booth encoder, so no extra delay is added. Compared to existing implementations,
the new multipliers offer lower power and a more compact area, and their regular structure allows efficient VLSI
implementation.